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EFFICIENT MONTE CARLO FOR HIGH EXCURSIONS OF 
GAUSSIAN RANDOM FIELDS 

By Robert J. Adler* Jose H. Blanchet'I' and Jingchen Liu* 
Technion, Columbia, Columbia 

f^^ ' Our focus is on the design and analysis of efficient Monte Carlo meth- 

*vj ' ods for computing tail probabilities for the suprcma of Gaussian random 

fields, along with conditional expectations of functionals of the fields given 
C . the existence of excursions above high levels, b. Naive Monte Carlo takes 

^2 ■ an exponential, in 6, computational cost to estimate these probabilities and 

conditional expectations for a prescribed relative accuracy. In contrast, our 

vQ ' Monte Carlo procedures achieve, at worst, polynomial complexity in 6, as- 

suming only that the mean and covariance functions are Holder continuous. 
We also explain how to fine tune the construction of our procedures in the 
presence of additional regularity, such as homogeneity and smoothness, in 

m^ • order to further improve the efficiency. 

T^ • 1. Introduction. This paper centers on the design and analysis of efficient 

C^ I Monte Carlo techniques for computing probabilities and conditional expectations 

related to high excursions of Gaussian random fields. More specifically, suppose 
that /: Txil— >Risa continuous Gaussian random field over a d dimensional 
compact set T C K.''. (Additional regularity conditions on T will be imposed below, 
as needed.) 

Our focus is on tail probabilities of the form 



> 

(N 



OO. (1.1) «;(6)=P(max/(i)>6) 

o ■ 

ly^ . and on conditional expectations 

o 

O: (1-2) E[r(/)|max/(t)>6], 






as 6 — )■ cx), where F is a functional of the field, which, for concreteness we take to 
be positive and bounded. 
, . , While much of the paper will concentrate on estimating the exceedence probabil- 

C^ ' ity (1.1), it is important to note that our methods, based on importance sampling, 

are broadly applicable to the efficient evaluation of conditional expectations of the 
form (1.2). Indeed, as we shall explain at the end of Section 4, our approach to ef- 
ficient importance sampling is based on a procedure which mimics the conditional 
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distribution of /, given that niax^ f{t) > b. Moreover, once an efficient (in a pre- 
cise mathematical sense described in Section 2) importance samphng procedure is 
in place, it follows under mild regularity conditions on T that an efficient estimator 
for (1.2) is immediately obtained by exploiting an efficient estimator for (1.1). 

The need for an efficient estimator of uj{b) should be reasonably clear. Suppose 
one could simulate 

r = sup/(t) 

exactly (i.e. without bias) via naive Monte Carlo. Such an approach would typically 
require a number of replications of /* which would be exponential in b^ to obtain 
an accurate estimate (in relative terms). Indeed, since in great generality (see [22]) 
w{b) = exp(— c6^ + o(6^)) as 6 — >■ oo for some c G (0, oo), it follows that the average 
of n i.i.d. Bernoulli trials each with success parameter w{b) estimates w{b) with 
a relative mean squared error equal to n~^/^(l — i(j(6))^/^/w(6)^/^. To control the 
size of the error therefore requires^ n = n{w{b)~^), which is typically prohibitively 
large. In addition, there is also a problem in that typically /* cannot be simulated 
exactly and that some discretization of / is required. 

Our goal is to introduce and analyze simulation estimators that can be applied 
to a general class of Gaussian fields and that can be shown to require at most a 
polynomial number of function evaluations in b to obtain estimates with a prescribed 
relative error. The model of computation that we use to count function evaluations 
and the precise definition of an algorithm with polynomial complexity is given in 
Section 2. Our proposed estimators are, in particular, asymptotically optimal. (This 
property, which is a popular notion in the context of rare-event simulation (cf. 
[6, 12]) essentially requires that the second moments of estimators decay at the same 
exponential rate as the square of the first moments.) The polynomial complexity 
of our estimators requires to assume no more than that the underlying Gaussian 
field is Holder continuous (see Theorem 3.1 in Section 3). Therefore, our methods 
provide means for efficiently computing probabilities and expectations associated 
with high excursions of Gaussian random fields in wide generality. 

In the presence of enough smoothness, we shall also show how to design estima- 
tors that can be shown to be strongly efficient, in the sense that their associated 
coefficient of variation remains uniformly bounded as 6 — ^ cx3. Moreover, the associ- 
ated path generation of the conditional field (given a high excursion) can, basically, 
be carried out with the same computational complexity as the unconditional sam- 
pling procedure (uniformly in b). This is Theorem 3.3 in Section 3. 

High excursions of Gaussian random fields appear in wide number of applications, 
including, but not limited to, 

• Physical oceanography: Here the random field can be water pressure or surface 
temperature. See [3] for many examples. 

• Gosmology: This includes the analysis of COBE and WMAP microwave data 
on a sphere or galactic density data. e.g. [8, 20, 21]. 



^Given h and g positive, we shall use the familiar asymptotic notation h(x) = 0{g{x)) if there 
is c < oo such that h{x) < cg{x) for all x large enough; h{x) = Q{g{x)) if h{x) > cg{x) if x is 
sufficiently large and h{x) = o{g{x)) as x — > oo if h{x)/g{x) — > as a; — > oo; and h{x) = 0{g{x)) 
if h{x) = 0{g{x)) and h{x) = n{g{x)). 
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• Quantum chaos: Here random planar waves replace deterministic (but unob- 
tainable) solutions of Schrodinger equations, e.g. the recent review [13]. 

• Brain mapping: This application is the most developed and very widely used. 
For example, the paper by Friston et al [15] that introduced random field 
methodology to the brain imaging community has, at the time of writing, 
over 4,500 (Google) citations. 

Many of these applications deal with twice differentiable, constant variance ran- 
dom fields, or random fields that have been normalized to have constant variance, 
the reason being that they require estimates of the tail probabilities (1.1) and these 
are only really well known in the smooth, constant (unit) variance case. In partic- 
ular, it is known that, with enough smoothness assumptions, 

(1.3) \nnmi-b-Hog\P{snpf{t)>b)-E{x{{teT: f{t)>b}))\ > 1 + ^, 

where x(^) is the Euler characteristic of the set A and the term a^ is related 
to a geometric quantity known as the critical radius of T and depends on the 
covariance structure of /. See [4, 23] for details. Since both the probability and the 
expectation in (1.3) are typically 0(6^ exp(— 6^/2) for some ^ > and large b, a 
more user friendly (albeit not quite as correct) way to write (1.3) is 

(1.4) P(sup/(t) > 6) « E{x{{t e T : /(i) > h})) x (l + ©(e"^"')), 

teT 

for some c. 

The expectation in (1.3) and (1.4) has an explicit form that is readily computed 
for Gaussian and Gaussian-related random fields of constant variance (see [4, 5] for 
details), although if T is geometrically complicated or the covariance of / highly 
non-stationary there can be terms in the expectation that can only be evaluated 
numerically or estimated statistically, (e.g. [1, 24]). Nevertheless, when available, 
(1.3) provides excellent approximations and simulation studies have shown that 
the approximations are numerically useful for quite moderate 6, of the order of 2 
standard deviations. 

However, as we have already noted, (1.3) holds only for constant variance fields, 
which also need to be twice differentiable. In the case of less smooth /, other classes 
of results occur, in which the expansions are less reliable and, in addition, typically 
involve the unknown Pickands' constants (cf. [7, 19]). 

These are some of the reasons why, despite a well developed theory, Monte-Carlo 
techniques still have a significant role to play in understanding the behavior of Gaus- 
sian random fields at high levels. The estimators proposed in this paper basically 
reduce the rare event calculations associated to high excursions in Gaussian random 
fields to calculations that are roughly comparable to the evaluation of expectations 
or integrals in which no rare event is involved. In other words, the computational 
complexity required to implement the estimators discussed here is similar in some 
sense to the complexity required to evaluate a given integral in finite dimension or 
an expectation where no tail parameter is involved. To the best of our knowledge 
these types of reductions have not been studied in the development of numerical 
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methods for high excursions of random fields. This feature distinguishes the present 
work from the appheation of other numerical techniques that are generic (such as 
quasi-Monte Carlo and other numerical integration routines) and that in particular 
might be also applicable to the setting of Gaussian fields. 

Contrary to our methods, which are designed to have provably good performance 
uniformly over the level of excursion, a generic numerical approximation procedure, 
such as quasi-Monte Carlo, will typically require an exponential increase in the 
number of function evaluations in order to maintain a prescribed level of relative 
accuracy. This phenomenon is unrelated to the setting of Gaussian random fields. In 
particular, it can be easily seen to happen even when evaluating a one dimensional 
integral with a small integrand. On the other hand, we believe that our estimators 
can, in practice, be easily combined with quasi-Monte Carlo or other numerical 
integration methods. The rigorous analysis of such hybrid approaches, although of 
great interest, requires an extensive treatment and will be pursued in the future. 
As an aside, we note that quasi-Monte Carlo techniques have been used in the 
excursion analysis of Gaussian random fields in [7] . 

The remainder of the paper is organized as follows. In Section 2 we introduce 
the basic notions of polynomial algorithmic complexity, which are borrowed from 
the general theory of computation. Section 3 discusses the main results in light of 
the complexity considerations of Section 2. Section 4 provides a brief introduction 
to importance sampling, a simulation technique that we shall use heavily in the 
design of our algorithms. The analysis of finite fields, which is given in Section 5, 
is helpful to develop the basic intuition behind our procedures for the general case. 
Section 6 provides the construction and analysis of a polynomial time algorithm for 
high excursion probabilities of Holder continuous fields. Finally, in Section 7, we 
add additional smoothness assumptions along with stationarity and explain how to 
fine tune the construction of our procedures in order to further improve efHciency 
in these cases. 

2. Basic Notions of Computational Complexity. In this section we shall 
discuss some general notions of efficiency and computational complexity related to 
the approximation of the probability of the rare events {Bf, '■ b > bg}, for which 
P (Bf,) — > as 6 — )> oo. In essence, efhciency means that computational complexity 
is, in some sense, controllable, uniformly in b. A notion that is popular in the 
efficiency analysis of Monte Carlo methods for rare events is weak efficiency (also 
known as asymptotic optimality) which requires that the coefficient of variation of 
a given estimator, Li, of P (i?f,), to be dominated by 1/P [B^Y for any e > 0. More 
formally, we have 

Definition 2.1. A family of estimators {Li, : b > bo} is said to be polynomially 
efficient for estimating P (Bf,) if E{Li,) = P {Bif) and there exists a q € (0, c») for 
which 



Moreover, if (2.1) holds with q ~ 0, then the family is said to be strongly efficient. 



(2.1) sup 5 < oo. 

b>bo[P{B,)]'\\ogP{B,)\'' 
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Below wc often refer to Lf, as a strongly (polynomially) efficient estimator, by 
which we mean that the family {Lf, : fe > 0} is strongly (polynomially) efficient. 
In order to understand the nature of this definition let {Ljj'' , I < j < n} be a 
collection of i.i.d. copies of L},. The averaged estimator 

has a relative mean squared error equal to [Var{Li,)]^''^ /[n^''^P{Bi,)]. A simple 
consequence of Chebyshev's inequality is that 



P 



(|Z„(6)/P(Bb 



1 



>e)< ^«^(^^) 



e^nP[{B,)Y 



Thus, if Lb is polynomially efficient and we wish to compute P {Bh) with at most 
£ relative error and at least 1 — 6 confidence, it suffices to simulate 

n^eie'^S-^llogPiBt)]") 

i.i.d. replications of Lf,. In fact, in the presence of polynomial efficiency, the bound 
n = e(e-2^-i|logP(Bf,)|«) can be boosted to n = e {£-^log{S-^)\logP{Bb)\'') 
using the so-called median trick (see [18]). 

Naturally, the cost per replication must also be considered in the analysis, and 
we shall do so, but the idea is that evaluating P {Bb) via crude Monte Carlo would 
require, given e and 5,n = Q {1/ P (Bb)) replications. Thus a polynomially efficiently 
estimator makes the evaluation of P (Bb) exponentially faster relative to crude 
Monte Carlo, at least in terms of the number of replications. 

Note that a direct application of deterministic algorithms (such as quasi Monte 
Carlo or quadrature integration rules) might improve (under appropriate smooth- 
ness assumptions) the computational complexity relative to Monte Carlo, although 
only by a polynomial rate (i.e. the absolute error decreases to zero at rate n~P for 
p > 1/2, where n is the number of function evaluations and p depends on the di- 
mension of the function that one is integrating; see for instance [6]). We believe that 
the procedures that we develop in this paper can guide the construction of efficient 
deterministic algorithms with small relative error and with complexity that scales 
at a polynomial rate in jlogP (i?f,)|. This is an interesting research topic that we 
plan to explore in the future. 

An issue that wc shall face in designing our Monte Carlo procedure is that, due 
to the fact that / will have to be discrctized, the corresponding estimator Lb will 
not be unbiased. In turn, in order to control the relative bias with an eifort that is 
comparable to the bound on the number of replications discussed in the preceeding 
paragraph, one must verify that the relative bias can be reduced to an amount 
less than s with probability at least 1 — (5 at a computational cost of the form 
0(£-«'' \\og P{Bb)\'''). If Lb{e) can be generated with 0(e-* \\ogP{Bb)D cost, 
and satisfying P {Bb) — ELb{£) < eP {Bb)., and if 

Var{Lb{e)) 

sup ^ < oo 

b>oP{Bb)^\\ogP{Bb)\'^ 
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for some q e (0,oo), then L[^ {b,e) ~ X^iLi ^h i^)!"^-! where the L^\£y% are i.i.d. 
copies of ib(e), satisfies 

e^ X n X P (_Bf,) 

Consequently, taking n = (e^^(5^^ |logP (Bb)!') suffices to give an estimator with 
at most £ relative error and 1 — 5 confidence, and the total computational cost is 

e(£-2-9o^-l|logP(5^)|9+9iy 

We shall measure computational cost in terms of function evaluations such as a 
single addition, a multiplication, a comparison, the generation of a single uniform 
random variable on T, the generation of a single standard Gaussian random variable 
and the evaluation of $(x) for fixed x > 0, where 






All of these function evaluations are assumed to cost at most a fixed amount c of 
computer time. Moreover, we shall also assume that first and second order moment 
characteristics of the field, such as /i (t) ~ Ef (t) and C (s, t) — Cov (/ (i) , / (s)) 
can be computed in at most c units of computer time for each s,t (^ T. We note that 
similar models of computation arc often used in the complexity theory of continuous 
problems, see [25]. 

The previous discussion motivates the next definition which has its roots in the 
general theory of computation in both continuous and discrete settings, [17, 25]. 
In particular, completely analogous notions in the setting of complexity theory of 
continuous problems lead to the notion of "tractability" of a computational problem 
[28]. 

Definition 2.2. A Monte Carlo procedure is said to be a fully polynomial 
randomized approximation scheme (FPRAS) for estimating P {Bh) if for some 
q, 91, 92 G [0, oo), it outputs an averaged estimator that is guaranteed to have at most 
e > relative error with confidence at least l—S € (0, 1) in &{£~''^S~'^^ \ logP (Bb) \'') 
function evaluations. 

The terminology adopted, namely FPRAS, is borrowed from the complexity 
theory of randomized algorithms for counting, [17]. Many counting problems can 
be expressed as rare event estimation problems. In such cases it typically occurs 
that the previous definition (expressed in terms of a rare event probability) coincides 
precisely with the standard counting definition of a FPRAS (in which there is no 
reference to any rare event to estimate). This connection is noted, for instance, in 
[10]. Our terminology is motivated precisely by this connection. 

By letting Bb = {/* > b}, the goal in this paper is to design a class of fully poly- 
nomial randomized approximation schemes that are applicable to a general class of 
Gaussian random fields. In turn, since our Monte Carlo estimators will be based on 
importance sampling, it turns out that we shall also be able to straightforwardly 
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construct FPRAS's to estimate quantities such as E[r {f)\ supf^j^ f{t) > b] for a 
suitable class of functional F for which F (/) can be computed with an error of 
at most e with a cost that is polynomial as function of e~^. We shall discuss this 
observation in Section 4, which deals with properties of importance sampling. 

3. Main Results. In order to state and discuss our main results we need some 
notation. For each s,t E T define 

fi{t) ^ E{f{t)), C{s,t)^Cov{f{s)J{t)), a^{t)^C{t,t)>0, r{s,t) ^ ^^"'^^ 



ais)ait)- 



Moreover, given x e R"* and /3 > we write \x\ = J2i=i l^jL where Xj is the j-th 
component of x. We shall assume that, for each fixed s,t Cz T, both fi (t) and C (s, t) 
can be evaluated in at most c units of computing time. 

Our first result shows that under modest continuity conditions on fi, a and r it 
is possible to construct an explicit FPRAS for w{b) under the following regularity 
conditions: 

Al: The field / is almost surely continuous on T; 

A2: For some S > and |s — t| < 6, the mean and variance functions satisfies 

\a (t) - a (s)| + \^i (t) -^iis)\< ^h\s - tf; 

A3: For some S > and |s — s'| < (5, |i — i'| < S the correlation function of / 

\r{t,s)-r{t',s')\<^H[\t-t'f + \s-s'n 
A4: £ T. There exist kq and Ud such that, for any i e T and e small enough, 

m{B{t,e)nT) > ko£%. 

where m is the Lebesgue measure, B{t,e) — {s : \t ~ s\ < e} and ujd = 
m(B(0, 1)). 

The assumption that € T is of no real consequence and is adopted only for 
notational convenience. 

Theorem 3.1. Suppose that f : T — > R is a Gaussian random field satisfying 
Conditions AI-A4 above. Then, Algorithm 6.1 provides a FPRAS for w (b) . 

The polynomial rate of the intrinsic complexity bound inherent in this result is 
discussed in Section 6, along with similar rates in results to follow. The conditions of 
the previous theorem are weak, and hold for virtually all applied settings involving 
continuous Gaussian fields on compact sets. 

Not surprisingly, the complexity bounds of our algorithms can be improved upon 
under additional assumptions on /. For example, in the case of finite fields (i.e. when 
T is finite) with a non-singular covariance matrix we can show that the complexity 
of the algorithm is actually bounded as b /^ 00. We summarize this in the next 
result, whose proof, which is given in Section 5, is useful for understanding the 
main ideas behind the general procedure. 
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Theorem 3.2. Suppose that T is a finite set and f has a non-singular covari- 
ance matrix over T x T. Then Algorithm 5.3 provides a FPRAS with q = 0. 

As wc indicated above, the strategy behind the discrete case provides the basis 
for the general case. In the general situation, the underlying idea is to discretize the 
field with an appropriate sampling (discretization) rule that depends on the level 
b and the continuity characteristics of the field. The number of sampling points 
grows as b increases, and the complexity of the algorithm is controlled by finding 
a good sampling rule. There is a trade off between the number of points sampled, 
which has a direct impact on the complexity of the algorithm, and the fidelity 
of the discrete approximation to the continuous field. Naturally, in the presence of 
enough smoothness and regularity, more information can be obtained with the same 
sample size. This point is illustrated in the next result, Theorem 3.3, which considers 
smooth, homogeneous fields. Note that in addition to controlling the error induced 
by discretizing the field, the variance is strongly controlled and the discretization 
rule is optimal, in a sense explained in Section 7. For Theorem 3.3 we require the 
following additional regularity conditions. 

Bl: /is homogeneous and almost surely twice continuously differentiablc. 

B2; G T C W^ is a d-dimensional convex set with nonempty interior. Denoting 
its boundary by dT, assume that dT is a (d— l)-dimensional manifold without 
boundary. For any t (^T, assume that there exists Kg > such that 

m{B{t,e)nT) > kqe'^, 

for any e < 1, where m is Lebesgue measure. 

Theorem 3.3. Let f satisfy conditions B1-B2. Then Algorithm 7.3 provides a 
FPRAS. Moreover, the underlying estimator is strongly efficient and there exists a 
discretization scheme for f which is optimal in the sense of Theorem 7.4- 

4. Importance Sampling. Importance sampling is based on the basic iden- 
tity, for fixed measurable B, 

(4.1) P(B)= j t{Lu(^B)dP{uj)^ f liLueB)^iuj)dQiuj), 

where we assume that the probability measure Q is such that Q {■ D B) is abso- 
lutely continuous with respect to the measure P {■ Ci B). If we use E'^ to denote 
expectation under Q, then (4.1) trivially yields that the random variable 

dP 

L{u)^1{lo(^B) — (w) 

is an unbiased estimator for P (B) > under the measure Q, or, symbolically. 

eQl = p{b). 

An averaged importance sampling estimator based on the measure Q, which is 
often referred as an importance sampling distribution or a change-of-measure, is 
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obtained by simulating n i.i.d. copies L^^' , ..., L^"' of L under Q and computing the 
empirical average L„ = (L^^^ + ... + L'-"^)/n. A central issue is that of selecting Q in 
order to minimize the variance of i„. It is easy to verify that if Q*{-) = P{-\B) = 
P{-r\B)/P{B) then the corresponding estimator has zero variance. However, Q* is 
clearly a change of measure that is of no practical value, since P (B) ~ the quantity 
that we are attempting to evaluate in the first place - is unknown. Nevertheless, 
when constructing a good importance sampling distribution for a family of sets 
{Bf, : & > 60} for which < P (Bf,) — > as fe — > 00, it is often useful to analyze the 
asymptotic behavior of Q* as P (Bh) — > in order to guide the construction of a 
good Q. 

We now describe briefiy how an efficient importance sampling estimator for 
P (Bh) can also be used to estimate a large class of conditional expectations given 
B\,. Suppose that a single replication of the corresponding importance sampling 
estimator, 

U = 1 (w e Bb) dP/dQ, 

can be generated in O {\og\P {Bh)\'^°) function evaluations, for some q^ > 0, and 
that 



Var{Lh) = 0{[P{Bh)f\og\P{Bb)\ 



ila\ 



These assumptions imply that by taking the average of i.i.d. replications of L^ we 
obtain a FPRAS. 

Fix /? e (0, 00) and let X {(3, q) be the class of random variables X satisfying 
0<X<I3 with 

(4.2) E[X\Bh]=n[l/\og{P{Bh)n 

Then, by noting that 

it follows easily that a FPRAS can be obtained by constructing the natural estima- 
tor for -^[Arl Bh]] i.e. the ratio of the corresponding averaged importance sampling 
estimators suggested by the ratio in the left of (4.3). Of course, when X is difficult 
to simulate exactly, one must assume the bias E[X;Bh] can be reduced in poly- 
nomial time. The estimator is naturally biased but the discussion on FPRAS on 
biased estimators given in Section 2 can be directly applied. 

In the context of Gaussian random fields, we have that Bh = {f* > b} and 
one is very often interested in random variables X of the form X ~ T (/), where 
r : C (T) — )• M and C{T) denotes the space of continuous functions on T. Endowing 
C(T) with the uniform topology, consider functions F that are non-negative and 
bounded by a positive constant. An archetypical example is given by the volume 
of (conditioned) high level excursion sets with /? = rn{T) is known to satisfy (4.2). 
However, there are many other examples of X (/?, q) with /3 = rn{T) which satisfy 
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(4.2) for a suitable q, depending on the regularity properties of the field. In faet, if 
the mean and covariance properties of / are Holder eontinuous, then, using similar 
techniques as those given in the arguements of Seetion 6, it is not difficult to see 
that q can be estimated. In turn we have that a FPRAS based importance sam- 
pling algorithm for w (6) would typically also yield a polynomial time algorithm for 
functional characteristics of the conditional field given high level excursions. Since 
this is a very important, and novel, application, we devote the remainder of this 
paper to the development of efficient importance sampling algorithms for w(b). 

5. The Basic Strategy: Finite Fields. In this section we develop our main 
ideas in the setting in which T is a finite set of the form T = {t^, ...,tM}- To 
emphasize the discrete nature of our algorithms in this section, we write Xi — f (ti) 
for 1,...,M and set X = {Xi, ...,Xm)- This section is mainly of an expository 
nature, since much of it has already appeared in [2]. Nevertheless, it is included 
here as a useful guide to the intuition behind the algorithms for the continuous 
case. 

We have already noted that in order to design an eflficient importance sampling 
estimator for w (b) = P (maxi<i<M Xi > b) it is useful to study the asymptotic 
conditional distribution of X, given that maxi<i<M^i > b. Thus, we begin with 
some basic large deviation results. 

Proposition 5.1. For any set of random variables Xi, . . . , Xm , 

M 



max P (X, >b) < P i max X, > b] < V P (X,- > 6) . 

l<i<M \l<i<M J ^-^ ■' 

Moreover, if the Xj are mean zero, Gaussian, and the correlation between Xi and 
Xj is strictly less than 1, then 

P {X, > 6, Xj >b) = o {max[P{X, > b), P{Xj > b)]) . 

Thus, if the associated covariance matrix of X is non-singular, 

M 

w{b) = {l + o{l))Y,P{X,>b). 



Proof. The lower bound in the first display is trivial, and the upper bound follows 
by the union bound. The second display follows easily by working with the joint 
density of a bivariate Gaussian distribution (e.g. [9, 16]) and the third claim is a 
direct consequence of the inclusion-exclusion principle. □ 

As noted above, Q* corresponds to the conditional distribution of X given that 
X* = maxi<i<M Xi > b. It follows from Proposition 5.1 that, conditional on X* > 
b, the probability that two or more Xj exceed b is negligible. Moreover, it also 
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follows that 

P{X,^X*\X*>b) = {l + o{l)) ^(-^»>^) . 

The following corollary now follows as an easy consequence of these observations. 

Corollary 5.2. 

dry (Q*, 2)^0 

as 6 —> oo, where dxy denotes the total variation norm and Q is defined, for Borel 
B C K^^ as 

M 

Q{x e B) = ^p{i,b) P[x e B\x, >hl 

where 

P{X,>h) 

P (*' b) = ^^M 



Proof. Pick an arbitrary Borcl B. Then we have that 

n*f^^i,^ PJ^ € B, maxi<,<M X, > b] ^ ^ P[X £ 5, X, > b] 

^ (^^^)= ^;^6r -h — ^^ — 

i—l 

The above, which follows from the union bound and the last part of Proposition 
5.1 combined with the definition of Q, yields that for each e > there exists bo 
(independent of B) such that, for all b > bo, 

Q*{XeB)<Q{XeB)/{l~e). 

The lower bound follows similarly, using the inclusion-exclusion principle and the 
second part of Proposition 5.1. □ 

Corollary 5.2 provides support for choosing Q as an importance sampling dis- 
tribution. Of course, wc still have to verify that the corresponding algorithm is a 
FPRAS. The importance sampling estimator induced by Q takes the form 

Note that under Q we have that X* > b almost surely, so the denominator in the 
expression for Li, is at least 1. Therefore, we have that 

E^Ll<lY^P{X,>b)\ , 
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and by virtue of Proposition 5.1 wc conclude (using VavQ to denote the variance 
under Q) that 

VavQ (Lb) 



P{X* > bf 



0, 



as 6 — >■ oo. In particular, it follows that Lb is strongly efficient. 
Our proposed algorithm can now be summarized as follows. 

Algorithm 5.3. There are two steps in the algorithm: 

Step 1: Simulate n i.i.d. copies X^^\ . . . ,X^^' of X from the distribution Q. 
Step 2: Compute and output 






i=l 

Where L« = E^Ii Pi^f > b)/E'U ^ixf > b). 

Since the generation of Li under Q takes O (M^) function evaluations we con- 
clude, based on the analysis given in Section 2, that Algorithm 5.3 is a FPRAS 
with 9 = 0. This implies Theorem 3.2, as promised. 

6. A FPRAS for Holder Continuous Gaussian Fields. In this section 
we shall describe the algorithm and the analysis behind Theorem 3.1. Throughout 
the section, unless stated otherwise, we assume conditions A1-A4 of Section 3. 

There are two issues related to the complexity analysis. Firstly, since / is assumed 
continuous, the entire field cannot be generated in a (discrete) computer and so the 
algorithm used in the discrete case needs adaptation. Once this is done, we need to 
carry out an appropriate variance analysis. 

Developing an estimator directly applicable to the continuous field will be car- 
ried out in Section 6.1. This construction will not only be useful when studying 
the performance of a suitable discretization, but will also help to explain some of 
the features of our discrete construction. Then, in Section 6.2, we introduce a dis- 
cretization approach and study the bias caused by the discretization. In addition, 
we provide bounds on the variance of this discrete importance sampling estimator. 

6.1. A Continuous Estimator. We start with a change of measure motivated 
by the discrete case in Section 5. A natural approach is to consider an importance 
sampling strategy analogous to that of Algorithm 5.3. The continuous adaptation 
involves first sampling Tb according to the probability measure 

E[m{Ab)\ 

where Ab = {t £ T : f (t) > b}. The idea of introducing Tb in the continuous setting 
is not necessarily to locate the point at which the maximum is achieved, as was the 
situation in the discrete case. Rather, Tb will be used to find a random point which 
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has a reasonable probability of being in the excursion set Ab- (This probability will 
tend to be higher if / is non- homogenous.) This relaxation will prove useful in the 
analysis of the algorithm. Note that t;,, with the distribution indicated in equation 
(6.1), has a density function (with respect to Lebesgue measure) given by 



hit) 

and that wc also can write 



Pjfjt) > b) 
E[m{At)] ■' 



E[m{Ab)]=E tif{t)>b)dt^ P{f{t)>b)dt^m{T)P{f{U)>h), 

JT JT 

where U is uniformly distributed over T . 

Once Tb is generated, the natural continuous adaptation corresponding to the 
strategy described by Algorithm 5.3 proceeds by sampling / conditional on /(t;,) > 
h. Note that if we use Q to denote the change-of-measure induced by such a con- 
tinuous sampling strategy, then the corresponding importance sampling estimator 

takes the form 

- _dP _ E[m {Ab)] 

' dQ m{Ab) 

The second moment of the estimator then satisfies 

(6.2) E'iliLbf] = E {Lb, Ab^0)= E[m{Ab)]P (/* > b) E[m{Ab)-^\Ab ^ 0]. 

Unfortunately, it is easy to construct examples for which _E'^[(Zb) ] is infinite. 
Consequently, the above construction needs to be modified slightly. 

Elementary extreme value theory considerations for the maximum of finitely 
many Gaussian random variables, and their analogues for continuous fields, give 
that the overshoot of / over a given level b will be of order Q{l/b). Thus, in order 
to keep Tb reasonably close to the excursion set we shall also consider the possibility 
of an undershoot of size Q (1/5) right at Tb- As we shall see, this relaxation will allow 
us to prevent the variance in (6.2) becoming infinite. Thus, instead of (6.1), we shall 
consider Tb^a/b with density 

..,^ . .,^ Pjfjt) >b-a/b) 
(D-3) rib-a/b[t) = ^, , . 7^, 

E[m[Ab-a/b)\ 
for some a > 0. To ease on later notation, write 

7a, b =b- a/b, T-y^ J, = Tb-a/b- 

Let Q' be the change of measure induced by sampling / as follows. Given t^^ ^, 
sample /(t^„ J conditional on fjr^^ J > 7a. b- In turn, the rest of / follows its con- 
ditional distribution (under the nominal, or original, measure) given the observed 
value /(t^„ b). We then have that the corresponding Radon- Nikodym derivative is 

dQ' m(^^„,J ' 
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and the importance sampling estimator Lj, is 

Note that we have used the continuity of the field in order to write {m (Ab) > 0} = 
{Ab y^ 0} almost surely. The motivation behind this choice lies in the fact that 
since m{Aj^ ^) > m (Ah) > 0, the denominator may now be big enough to control 
the second moment of the estimator. As we shall see, introducing the undershoot 
of size a/b will be very useful in the technical development both in the remainder 
of this section and in Section 7. In addition, its introduction also provides insight 
into the appropriate form of the estimator needed when discretizing the field. 

6.2. Algorithm and Analysis. We still need to face the problem of generating 
/ in a computer. Thus we now concentrate on a suitable discretization scheme, 
still having in mind the change of measure leading to (6.4). Since our interest is 
to ultimately design algorithms that are efficient for estimating expectations such 
as E[r (/) I/* > b], where F may be a functional of the whole field, we shall use a 
global discretization scheme. 

Consider U = (Ui, ...,Um) where Ui are i.i.d. uniform random variables taking 
values in T and independent of the field /. Set Tm ~ {Ui, ....,Um} and Xi = 
X^ {U^) = /([/,) for 1 < i < M. Then X = (Xi,...,X™) (conditional on U) is 
a multivariate Gaussian random vector with conditional means fJ.{Ui) = E{Xi\Ui) 

and covarianccs C {Ui, Uj) — Cov {Xi,Xj\ Ui, Uj). Our strategy is to approximate 
w{b) by 

WM{la,b) = P(max /(i) > -/a,b) = E[P{ max X, > -/a,b\U)]. 

^^Tm l<i<A'I 

Given the development in Section 5, it might not be surprising that if we can ensure 
that M = M (e, b) is polynomial in 1/e and b, then we shall be in a good position 
to develop a FPRAS. The idea is to apply an importance sampling strategy similar 
to that we considered in the construction of L[, of (6.5), but this time it will be 
conditional on U. In view of our earlier discussions, we propose sampling from Q" 
defined via 

M 



"{XeB\U)=J2Pu (0 P[X eB\X,> ja,b, U], 
PiX,>^a,b\U) 



i=l 

where 

Pu{i) 



Ef=iPiXj>7a,b\Uy 
We then obtain the (conditional) importance sampling estimator 

(6-6) Lb(U) - M ..y ^ — • 

T,i=l MX, > Ja,b) 
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It is clear that 

WMila,b)=EQ"[Lb{U)]. 

Suppose for the moment that M, a and the number of rephcations n have been 
chosen. Our future analysis will in particular guide the selection of these parameters. 
Then the procedure is summarized by the next algorithm. 

Algorithm 6.1. The algorithm has three steps: 

Step 1: Simulate U^^' , ..., U^"' which are n i.i.d. copies of the vectorU ~ {Ui, ..., Um) 

described above. 

Step 2: Conditional on each U^^' , for i = 1, ...,n, generate Lj^ (U^^') as described 

by (6.6) by considering the distribution of X^'^U^''') = (xf ^([/f ^), ...,X^)(C7|^^)). 

Generate the X^"^' {U^^') independently so that at the end we obtain that the Lj^ {U^'^') 

are n i.i.d. copies of Lh{U). 

Step 3: Output 



n 

L„(t/w,...,f/(")) = i5;]L«(c/«). 



6.3. Running Time of Algorithm 6.1: Bias and Variance Control. The remain- 
der of this section is devoted to the analysis of the running time of the Algorithm 
6.1. The first step lies in estimating the bias and second moment of Lb{U) under 
the change of measure induced by the sampling strategy of the algorithm, which 
we denote by Q" . We start with a simple bound for the second moment. 

Proposition 6.2. There exists a finite Aq, depending on /iy = maxigy 1/^(^)1 
and a^ ~ maxteT o'^ (t), for which 

E'^"[Lb{U)^] < XoM^P ( max f{t) > b 



Proof. Observe that 

EQ"[UiUy] < E I |f]F(X, >7„,,|t/,)) 

<^( (X]supP(/(C/0>7a.b|t/. =t)j 

= M2nmxP(/(i)>7„,bf 
< XaM^ max P if (t) > bf 

<XoM^p(maxf{t)>b\ , 
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which completes the proof. □ 

Next we obtain a prehminary estimate of the bias. 

Proposition 6.3. For each M >1 we have 

\w (b) - WM {b)\<E [exp {-Mm (^^„ J /m{T)) ; A n T ^ 0] 

+ P ( max / (t) > 7a, b, max f (t) < b 
\ t£T teT 

Proof. Note that 

\w{b) - WM ha,b)\ 

< P(max/(i) < 7a,6,max/(t) > b) + P{max f (t) > 7a,&,max/ (i) < b). 

The second term is easily bounded by 

P(max/(i) > 7a,6,max/(i) < b) < P(max/(i) > 7a.(„max/(t) < b). 
t&TM t£T teT ' teT 

The first term can be bounded as follows. 



P ( max / (t) < ^a.b, max/ (t) > b 1 

\*GTj,f teT J 



< E 

< E 



M 



iP[.fm<7aAf]y UAtnT^0) 



, A/ 



(l-m(A^„J/m(r)) ■AknT^0) 
< E [exp (-A/m(A^^ J/m (T)) ; A n T 7^ 0)] . 
This completes the proof. □ 

The previous proposition shows that controlling the relative bias of L(,(U) re- 
quires finding bounds for 

(6.7) E [exp(-A/m(A^_ J/m(T)); ^ n T 7^ 0] 
and 

(6.8) P(max / (i) > ja,b, max / (t) < 6), 

and so we develop these. To bound (6.7) wc take advantage of the importance 
sampling strategy based on Q' introduced earlier in (6.4). Write 

(6.9) E [exp (-A'/m(A^ J/jTi (T)) ;?7i(A) > O] 
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Furthermore, note that for each a > we have 

„, /exp (-Mm (A^ ^ Im (T)) \ 

(6.10) eQ {^^ ;/° / " ;m{Ab)>0] 

<a~'^exp{~Ma/m{T)) 

The next result, whose proof is given in Section 6.4, gives a bound for the above 
expectation. 

Proposition 6.4. Let P be as in Conditions A2 and A3. For any v > 0, there 
exist constants k, A2 G (0, 00) (independent of a ^ (0, 1) and b, but dependent on v) 
such that if we select 

and define W such that P [W > x) = exp (— x'^/'') for x >0, then 

^^ / exp(-M„(A )/^(r)) ^ \ 

2 m(r) fb\'''/^ 

The following result gives us a useful upper bound on (6.8). The proof is given 
in Section 6.5. 

Proposition 6.5. Assume that Conditions A2 and A3 are in force. For any 
V > 0, let p = 2d/ P -\- dv + 1, where d is the dimension of T . There exist constants 
&o,A € (0,00) (independent of a but depending on ^t ~ nraxtgy |/^ (01; '^t ~ 
max-teT o'^ (t), v, the Holder parameters {3, and kh) so that for all b > bg > 1 we 
have 



(6.12) P niax/(t) <b + a/b\ max f{t) >b] < Xabr 

Consequently, 

P ( max / (t) > ja,b, max f (t) < b 

= P f max/ (i) < b\ max fit) > -/a A P f ma^/(i) > 7a„ 

<\abPpimaxf{t) > 7a,;, j . 
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Moreover, 



P ( max/(i) > 7a,6 1 (1 - XabP) < P ( max/(<) > b 

Propositions 6.4 and 6.5 allow us to prove Theorem 3.1, which is rephrased in 
the form of the following theorem, which contains the detailed rate of complexity 
and so the main result of this section. 

Theorem 6.6. Suppose f is a Gaussian random field satisfying conditions Al- 
A4 in Section 3. Given any v > Q, put a = e/ {AXb^) (where A and p as in Proposition 
6.5), and a^^ = K'^^'^{b/a)^'^+'"^'^/^ Then, there exist c,eo > such that for all 

£ < eo, 

(6.13) \wib)~WMila,b)\<wib)e, 

if M = \ce^^{b/aY^^ ' '^\ . Gonsequently, by our discussion in Section 2 and the 
bound on the second moment given in Proposition 6.2, it follows that Algorithm 6.1 

provides a FPRAS with running time O I (A/) x [M) x e^'^5^^ ) . 



Proof. Combining (6.3), (6.9) and (6.10) with Propositions 6.3-6.5 we have that 

\w (b) - WM (b) I 

< a"^ exp {~Ma/m (T)) E [m (A^^,)] 

m{T) fhV'^'^ ^r ,. .. / XabP ' 



Furthermore, there exists a constant K g (0,oo) such that 

Em (^7„ ,) < K maxP (/ it) > b) m (T) < Kw (6) m (T) 

Therefore, we have that 

\w (6) - WM jb) I _i , , , , , ., 

y-T < a Kmil )expi~Ada/m[I )) 

w (o) 



^2 



Xl^/^MKaJ \1-Xabpj 



Moreover, since a = e/(4A6''), we obtain that, for e < 1/2, 
w (o) 



4d//3 

+ [EW']K^^^^[-) +e/2. 
Xn '' M \aj 



miry fb\ 
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From the selection of a, M and 9 it follows easily that the first two terms on the 
right hand side of the previous display can be made less than e/2 for all e < £o by 
taking Sq sufficiently small 

The complexity count given in the theorem now corresponds to the following 
estimates. The factor 0{{M) ) represents the cost of a Cholesky factorization re- 
quired to generate a single replication of a finite field of dimension M. In addition, 
the second part of Proposition 6.2 gives us that 0{M^e~'^S~^) replications arc re- 
quired to control the relative variance of the estimator. □ 

We now proceed to prove Propositions 6.4 and 6.5. 

6.4. Proof of Proposition 6.4. We concentrate on the analysis of the left hand 
side of (6.11). An important observation is that conditional on the random variable 



Tj^ i, with distribution 



Q'(^7a . e 



E [mjAj^., n ■)] 



and, given /(r^^ j), the rest of the field, namely (/ (t) : t E T\{tj^ ^}) is another 
Gaussian field with a computable mean and covariance structure. The second term 
in (6.10) indicates that we must estimate the probability that m{A-y^ ^) takes small 
values under Q'. For this purpose, we shall develop an upper bound for 

(6.14) P (to(A^„ J < y-\miAb) > 0| / (i) = 7^,6 + z/ja^t) , 

for y large enough. Our arguments proceeds in two steps. For the first, in order to 
study (6.14), we shall estimate the conditional mean covariance of {/ (s) : s £ T}, 
given that / (t) = 70,;, + z/ja,b- Then, we use the fact that the conditional field is 
also Gaussian and take advantage of general results from the theory of Gaussian 
random fields to obtain a bound for (6.14). For this purpose we recall some useful 
results from the theory of Gaussian random fields. The first result is due to Dudley 
[14]. 

Theorem 6.7. LetU be a compact subset o/R", and let {fo(t) : t G U} be a 
mean zero, continuous Gaussian random field. Define the canonical metric d on U 
as 

d{s,t)^^E[foit)~fois)]^ 

and put diam ilA) = sup^ ^^^ d (s, t), which is assumed to be finite. Then there exists 
a finite universal constant k > such that 

(.diam(W)/2 

£;[max/o [t)] < K / [log [N {e))Y'^de, 

teu Jo 

where the entropy M {e) is the smallest number of d— balls of radius e whose union 
covers U. 
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The second general result that we shall need is the so-called B-TIS (Borel- 
Tsirelson-IbragnTLOv-Sudakov) inequality [4, 11, 26]. 

Theorem 6.8. Under the setting described in Theorem 6.1, 



whe 



P ( ma^/o (i) - ^[max/o W] > ^'j < cxp (-6V(2a^)) , 



4=maxi?[/2(i)]. 



We can now proceed with the main proof. We shall assume from now on that 
Ty^ i, ~ 0, since, as will be obvious from what follows, all estimates hold uniformly 
over T^^ ^ € T. This is a consequence of the uniform Holder assumptions A2 and 

A3. Define a new process / 

(fit) : i e T) ^ (/ (i) : i € r| / (0) = 7a,^ + z/la,b). 

Note that we can always write f (t) — Jl (t) + g (t), where g is a mean zero Gaussian 
random field on T. We have that 

]1 (t) - Ef{t) ^ii{t)+a (0)"' G (0, t) (7,.b + z/7„,b - /. (0)) , 

and that the covariance function of / is given by 

Cg {s, t) = Cov {g (s) , g (t)) ^C{s,t)~a (0)"' C (0, s) C (0, t) . 

The following lemma describes the behavior of Jl (t) and Cg (t, s). 

Lemma 6.9. Assume that \s\ and \t\ small enough. Then the following three 
conclusions hold. 

(i) There exist constants Aq and Ai > such that 

[fi (t) - (7a,& + z/la,b)\ < Ao Itl*^ + Ai \tf (7a,b + z/-fa,b), 

and for all z e (0, 1) and ')a,b large enough, 

l/X (s) -Ji{t)\< Kfl-7a,b|s - tf- 

Cg (s, t) < 2kh<j (t) a (,s) {\tf + \sf + \t- sf}. 
(Hi) 

Dg {s, t) ^ ^jE{[g{t)^g{s)f) < X^ \t - sf'' . 

Proof. All three consequences follow from simple algebraic manipulations. The 
details are omitted. □ 
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Proposition 6.10. For any v > 0, there exist k and A2, such that for all t (z T, 
y-13/d ^ _a___^ ^ ^ sufficiently small, and z > 0, 



P 



(m(A^„ J-i > 2/, m(A) > I / (t) = -ta,b + zha,b) < cxp (^-X2aY^Vb^) ■ 



Proof. For notational simplicity, and without loss of generality, we assume that 
t = 0. First consider the case that z > 1. Then there exist ci,C2 such that for all 
C2y-/5/'^ < 6-2-'' and z > 1, 



P 



(m {A^^, n T) ' > y, m(A) > 0| / (0) = -faM + z/la^) 

<P( inf /(t)<7a,6|/(0)-7a.b+2/7a,6 

\\t\<ciy-i-/'' 



= P[ inf M(i)+g(t)<7a.b 

|t|<ciy-i/<i 

< P ( inf , g(t) < ^ 



t|<cia-i/d 27a, 6 

Now apply (iii) from Lemma 6.9, from which it follows that Af (e) < c^m{T) / e^'^/ ^ 
for some constant C3. By Theorem 6.7, i?(sup|j|^j,^,y-i/d /(i)) = 0(y~'^/(2rf) Jogy). 
By Theorem 6.8, for some constant C4, 

P (m (^^„ , n T)"' > y, m(Ab) > 0| / (0) = 7^,6 + z/7a,b) 
<p( inf ^ 5(t) < - ^ 



|t|<cii/-i/d 27a, 6 



for C2y~^^'^ < fe"^"" and z > 1. 

Now consider the case z S (0, 1). Let t* be the global maximum of f{t). Then, 

P (m {A^^, n T)~' > y, m{Ab) > 0|/ (0) = ja,b + z/ja^t) 



<P{ inf fit)<Ja,bJit*)>b\fiO)=Ja,b + z/la,b 

\|t-t*|<cij/-i/<* 

<^( sup \fis)~f{t)\>a/b\fiO)=Ja,b + zhaA- 

\|s-t|<ciy-i/d y 

Consider the new field ^(s, i) = g{s) — g{t) with parameter space T x T. Note that 



V^^ar(e(s,i)) = Z?<,(s, i) < Alls - f|'='/2. 

Via basic algebra, it is not hard to show that the entropy of ^(s, t) is bounded by 
A/"^ (e) < c^ra{T x T)/e'^'^/^ . In addition, from (i) of Lemma 6.9, we have 
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,2+1; 



Similarly, for some k > and all y '^/'^ <-^(f) , a < 1, there exists C5 such that 
P (m {A^^, nry' > y, m{Ah) > 0|/ (0) = 7a,6 + z/-/a^b) 

<P( sup \fis)-fit)\>a/b\fiO)^-fa,b + z/-fa,b] 

\|s-t|<cily-l/d J 

= P( sup |e(.,t)|>^| 



< exp — 



c^b'^y- 
Combining the two cases z > 1 and z e (0, 1) and choosing C5 large enough we have 

P (m (A^„_, nPy' > y, m{Ab) > | / (0) = -faM + zf-fa^) 

< exp 



for a small enough and j/^'^/'' < i (|) . Renaming the constants completes the 
proof. D 

The final ingredient needed for the proof of Proposition 6.4 is the following lemma 
involving stochastic domination. The proof follows an elementary argument and is 
therefore omitted. 

Lemma 6.11. Letvi andv2 be finite measures on M. and define 'r]j{x) =/ Vj{ds). 
Suppose that ?/i {x) > rj2{x) for each x > xq. Let {h{x) : x> xq) be a non- 
decreasing, positive and bounded function. Then, 

/■OO 

h (s) vi (dx) > ^{s) V2 (ds) . 

Xo J Xo 

Proof of Proposition 6.4. Note that 
j^Q ^_^ \ i..^)l y >) .^ (^^^ J ^ (0^ a)-m{A,) > 



^. exp,-..^,.. ,/^,.,, ^^ e (OMMAb) > 

V miA^nJ 



(A 
exp {-Mm (A-y„,,) /m (T)) 

/ (t) - la.h + ^)P (r^„ , e dt) P{jaAf (t) - 7a,h] G d^l / (0 > Ta.h) ■ 

/ cxp(-Mm(A^„ J/m(r)) 
z>o,teT ^ m{Aj^, 

f{t)^la.b+^. 

^a.hy 



< sup sf^iA ;/°'\ ^^ m(A^„j€(0,a);m(A)>0 



MONTE CARLO FOR RANDOM FIELDS 23 

Now define Y {b/a) ~ {\)la)^'^I^^X^ ' W with A2 as chosen in Proposition 6.10 
and W with distribution given by P{W > x) = cxp{—x^^'^). By Lemma 6.11, 

/exp(-Afm(yl^ ,)/m(r)) , , ^ ^ ^,^ 

teT V m{A^^aJ 

< E [Y{b/a) eyLpi-MYib/ay^ /m(T));Y{b/a) > a'^] . 

Now let Z be exponentially distributed with mean 1 and independent of Y{b/a). 
Then we have (using the definition of the tail distribution of Z and Chebyshev's 
inequality) 

exp{-MY{b/a)~^/m{T)) = P {Z > MY{b/a)-^ /m{T)\Y (b/a)) 

m (T) Y [b/a) 



< 



M 



Therefore, 



£;[exp {-MY{b/ar^/m (T)) Y{b/a); Y{b/a) > a^^] 
<^E[Y{b/ar;Yib/a)>a-'] 
m{T) /^^y^/^ 

which completes the proof. □ 



6.5. Proof of Proposition 6.5. We start with the following result of Tsirelson 

[27]. 

Theorem 6.12. Let f be a continuous separable Gaussian process on a compact 
(in the canonical metric) domain T . Suppose that Var (/) = a is continuous and 
that cr (i) > for t G T. Moreover, assume that fi = Ef is also continuous and 
li{t) > for all t e T. Define 

a'^ = maxVar(/(i)), 

and set F{x) = P{inaxt^T f{t) < x}. Then, F is continuously differentiable on ]R. 
Furthermore, let y be such that F{y) > 1/2 and define y* by 

F{y) = $(y*). 

Then, for all x > y, 

F'{x) < vp [^ f^l + 2a) + l) (1 + a), 
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where 



x{x - y)yl ' 
We can now prove the following lemma. 

Lemma 6.13. There exists a constant A G (0, oo) independent of a and 6 > 
such that 



.15) P sup f{t) <h + a/b\ sup f{t) >h] <aA 



,P(sup,gj,/(t)>6-l/&) 



teT t€T J P (suptgr / {t) > b) 

Proof. By subtracting init^T 1^ [t) > — oo and redefining the level & to be & — 
uiiteT M (i) "^6 may simply assume that Ef (t) > so that we can apply Theorem 
6.12. Adopting the notation of Theorem 6.12, first we pick bo large enough so that 
F {bo) > 1/2 and assume that b> bo + 1. Now, let y = b - 1/6 and F{y) = ^{y*). 
Note that there exists do G (0, oo) such that Sob < y* < S^^^b for all b > bo- This 
follows easily from the fact that 

logp(sup/(i) > x\ r^ logsupP{/(i) > x} ^. 

iter J teT ^cr^. 

On the other hand, by Theorem 6.12 F is continuously difFcrcntiablc, and so 

r ^ [''+"/'' F'f.TW.T 

(6.16) P sup fit) <b + a/b\ sup f(t) >b\ = -f ;;, •,. . 

UsT teT J P{suptg-r/(i) > 6} 

Moreover, 

F\x)< fl-^f^\\ f^{i + 2a{x)) + l\^{l + a{x)) 

< (1 - $ (y,)) (^{l + 2a (x)) + l) ^(1 + a (x)) 

\ y J V 

^p(msixf{t)>b-l/b\ (^^{l + 2a{x)) + l) —{l + a{x)). 
Therefore, 

fb+a/b 



F'{x)dx 

b 

< P f sup f{t)>b- l/b\ f " f ^(1 + 2a (x)) + 1^ ^(1 + a (x)) dx. 

Wt J Jb \ y J y 

Recalling that a (x) = y^/[x{x — y)yl], we can use the fact that y, > 6ob to conclude 
that if a; S [b,b + a/b] then a (x) < Sq^, and therefore 



/ 



f ^(1 + 2a (x)) + l] ^(1 + a (x)) dx < 4S^\. 

b \ y J y 
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Wc thus obtain that 



f.,^. it^''"F'{x)dx ^ P{sup,^^f{t) >b-l/b} 

^ ' P{sup,gy/(i)>6}- "° P{sup,^Tf{t)>b} ■ 

for any 6 > 69. This incquahty, together with the fact that F is continuously differ- 
entiablc on (—00, cxj), yields the proof of the lemma for 5 > 0. □ 

The previous result translates a question that involves the conditional distri- 
bution of maxigT / (i) near b into a question involving the tail distribution of 
maxjgy / (i). The next result then provides a bound on this tail distribution. 

Lemma 6.14. For each v > there exists a constant C{v) € (0,oo) (possibly 
depending on v > but otherwise independent of b) so that such that 

P ( max/ (t) > M < C(w)fe2''/'^+''"+i mayi P{f (t) > b) 

for all b > 1. 



Proof. The proof of this result follows along the same lines of Theorem 2.6.2 in 
[5]. Consider an open cover of T = ufi^T^{0), where T^{0) = {s : \s - t^\ < 9}. 
We choose ti carefully such that N{6) = 0{9~'^) for 9 arbitrarily small. Write 
f (f) = 9 (t) + M ('^)! where g (t) is a centered Gaussian random field and note, using 
A2 and A3, that 



P max f(t)>b] < P { max q it) > b - n fc) - khO''^ 

Now we wish to apply the Borel-TIS inequality (Theorem 6.8) withZ^ = Ti (9), /o = 
g, d (s, t) ~ E^^'^dg (i) — g (s)] ), which, as a consequence of A2 and A3, is bounded 
above by Cq \t — s\ for some Cq € (0, 00). Thus, applying Theorem 6.7, we have 
that E maXf^Tiid) 9 (t) < Ci6"^/^ log(l/6') for some Ci € (0,oo). Consequently, the 
Borel-TIS inequality yields that there exists C2 (w) G (0,oo) such that for all b 
sufficiently large and 9 sufficiently small we have 

/ A ( (b-fi(t,)-Ci9'^/(^+P-')f\ 
P f max git)>b-ii {t,) - >,hO^ j < C2 {v) exp I -^ ^^^ ^J '- J , 

where aTi = maxtgT.(g) a (i). Now select v > small enough, and set 0'5/(2+/3f) — 
b~^. Straightforward calculations yield that 

P (,max^ f{t)>b^ < P (^^max^ 9 (t) > b - ^i {t,) kh9^^ 

<: r ( ^ ( ^^^JtM\ 
< G3 (v) max exp ^ — 



26 ADLER, BLANCHET, LIU 

for some C3{v) € (0,oo). Now, recall the well known inequality (valid for a; > 0) 
that 

\x x-^ J X 

where = $' is the standard Gaussian density. Using this inequality it follows that 
C4 (u) G (0, 00) can be chosen so that 

max exp(- ^^~^'^y ] < C4 (w) 6maxP (/(<)> 6) 
t^rm ' \ 2a (tf j ~ teT. 

for all 6 > 1. We then conclude that there exists C {v) G (0, 00) such that 

P (max f{t)>b\ <N{e) C4 (v) b max P (/ (t) > b) 

< C0-%iRaxP{f (t) >b) = Cb^d/p+dv+i jnaxP (/ (i) > b) , 

giving the result. □ 

We can now complete the proof of Proposition 6.5. 

Proof of Proposition 6.5. The result is a straightforward corollary of the previous 
two lemmas. By (6.15) in Lemma 6.13 and Lemma 6.14 there exists A G (0,oo) for 
which 



P ( max fit) <b + a/b\ max /(i) > b 

^^^^ P{max,^Tf{t)>b-l/b) 
P (maxteT / (t) > b) 

< aCAh^'^'^+'^''+^ maxtgT P {f jt) > b - l/b) 

P (maxteT / (t) > b) 

< ar:'^i52rf/g+d»+i maxtgr P [f [t) > b - l/b) 
- ' master P (/ {t) > b) 

< aXb^^/f'+''^+\ 

The last two inequalities follow from the obvious bound 



P (^max / (t) >bj> max P (/ (t) > b) 

and standard properties of the Gaussian distribution. This yields (6.12), from which 
the remainder of the proposition follows. D 
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7. Fine Tuning: Twice Differentiable Homogeneous Fields. In the pre- 
ceeding section we constructed a polynomial time algorithm based on a randomized 
discretization scheme. Our goal in this section is to illustrate how to take advantage 
of additional information to further improve the running time and the efficiency of 
the algorithm. In order to illustrate our techniques we shall perform a more refined 
analysis in the setting of smooth and homogeneous fields and shall establish op- 
timality of the algorithm in a precise sense, to described below. Our assumptions 
throughout this section are B1-B2 of Section 3. 

Let C{s — t) = Cov(/(s), f{t)) be the covariance function of /, which we assume 
also has mean zero. Note that it is an immediate consequence of homogeneity and 
differentiability that diC{0) = df^f^C^O) = 0. 

We shall need the following definition. 

Definition 7.1. We call T — {ii, ...,t7\/} C T a 6-regular discretization of T 
if, and only if 

min \ti — tj\ > 9, supmin \ti — t\ < 29. 

Regularity ensures that points in the grid T are well separated. Intuitively, since 
/ is smooth, having tight clusters of points translates to a waste of computing 
resources, as a result of sampling highly correlated values of /. Also, note that 
every region containing a ball of radius 19 has at least one representative in T . 
Therefore, T covers the domain T in an economical way. One technical convenience 
of ^-regularity is that for subsets ACT that have positive Lcbesguc measure (in 
particular ellipsoids) 

#(Anr) m(A) 

hm = — — — , 

A/^oo M m{T) ' 

where here and throughout the remainder of the section #(A) denotes the cardi- 
nality of the set A. 

Let T = {ii, ..., ij\f } be a 0- regular discretization of T and consider 

X = {X,,...,XMf^{f{t,),...,f{tM)f. 

We shall concentrate on estimating wm (b) = P (uiaxi<i<M Xi > b). The next result 
(which we prove in Section 7.1) shows that \i9 = e/b then the relative bias is O (a/e). 

Proposition 7.2. Suppose f is a Gaussian random field satisfying conditions 
Bl and B2. There exist cq, C\, bo, and Eq such that, for any finite e/b-regular 
discretization T of T , 

(7.1) P(sup f{t) < b\ sup fit) >b)< coa/e, and #(T) < cim{T)e-'^b'^, 

for all e £ (0, Eq] and b > bo. 

Note that the bound on the bias obtained for twice differentiable fields is much 
sharper than that of the general Holder continuous fields given by (6.13) in Theorem 
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6.6. This is partly because the conditional distribution of the random field around 
local maxima is harder to describe in the Holder continuous than in the case of 
twice differentiable fields. In addition to the sharper description of the bias, we 
shall also soon show in Theorem 7.4 that our choice of discretization is optimal in a 
cetain sense. Finally, we point out that the bound of ^Je in the first term of (7.1) is 
not optimal. In fact, there seems to be some room of improvement, and we believe 
that a more careful analysis might yield a bound of the form cqE^ . 

We shall estimate wm (b) by using a slight variation of Algorithm 5.3. In partic- 
ular, since the X^'s are now identically distributed, we redefine Q to be 



M 

(7.2) g(xei3) = ^— P[XeB|X, >6-l/5]. 

Our estimator then takes the form 

. X ~ M X PlXi>b-l/b) , 
7.3 U = — ^, !^^^ ^^1 max X, > h). 

Clearly, we have that E'^{Li,) = wm (b)- (The reason for for subtracting the fcator 
of 1/6 was explained in Section 6.1.) 

Algorithm 7.3. Given a number of replications n and an s/b-regular dis- 
cretization T the algorithm is as follows: 

Step 1: Sample X^^' , ..., A"*-"^ i.i.d. copies of X with distribution Q given by (7.2). 
Step 2: Compute and output 



1 " ~ 

n ^ — ^ 



n 



{V) 
b ' 



where 



E,ii l(^f > b - 1/b) i<^<^^ ' 



Theorem 7.5 later guides the selection of n in order to achieve a prescribed 
relative error. In particular, our analysis, together with considerations from Section 
2, implies that choosing n = O (e^^(5^^) suffices to achieve e relative error with 
probability at least 1 — S. 

Algorithm 7.3 improves on Algorithm 6.1 for Holder continuous fields in two 
important ways. The first aspect is that it is possible to obtain information on the 
size of the relative bias of the estimator. In Proposition 7.2, we saw that in order 
to overcome bias due to discretization, it suffices to take a discretization of size 
M = #(r) = 8(6''). That this selection is also asymptotically optimal, in the sense 
described in the next result, will be proven in Section 7.1. 
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Theorem 7.4. Suppose f is a Gaussian random field satisfying conditions Bl 
and B2. If 6 e (0, 1), then, as b^ oo, 

sup P(sup f{t) > b\ sup f{t) > 6) ^ 0. 

#(T)<&«<' t£T *6T 

This result implies that the relative bias goes to 100% as 6 — ?► oo if one chooses a 
discretization scheme of size O (b^'^^ with 6 G (0, 1). Consequently, d is the smallest 
power of b that achieves any given bounded relative bias, and so the suggestion 
above of choosing M ~ O (b'^) points for the discretization is, in this sense, optimal. 

The second aspect of improvement involves the variance. In the case of Holder 
continuous fields, the ratio of the second moment of the estimator and if (b) was 
shown to be bounded by a quantity that is of order 0{M^). In contrast, in the 
context of smooth and homogeneous fields considered here, the next result shows 
that this ratio is bounded uniformly for b > bo and M = #(T) > cb'^. That is, the 
variance remains strongly controlled. 

Theorem 7.5. Suppose f is a Gaussian random field satisfying conditions Bl 
and B2. Then there exist constants c, bo, and Eq such that for any e/b-regular 
discretization T of T we have 

sup — — —^ < sup — —T 7- < c. 

b>bo.eel0.eo] P (suPteT / (^) > o) 6>bo,ee[0,eo] P V'^Y>tef f W > ^) 

for some c £ (0, oo) . 

The proof of this result is given in Section 7.2. The fact that the number of 
replications remains bounded in 6 is a consequence of the strong control on the 
variance. 

Finally, we note that the proof of Theorem 3.3 follows as a direct corollary of The- 
orem 7.5 together with Proposition 7.2 and our discussion in Section 2. Assuming 
that placing each point in T takes no more than c units of computer time, the total 
complexity is, according to the discussion in section Section 2, O (nM^ + Af) = 
O {e^^S^^AP + M). The contribution of the term M^ = O (e^^b^) comes from 
the complexity of applying Cholesky factorization and the term M = 0{e^^b) 
corresponds to the complexity of placing T. 

Remark 7.6. Gondition B2 imposes a convexity assumption on the boundary 
ofT. This assumption, although convenient in the development of the proofs of The- 
orems 7.4 and 7.5, is not necessary. The results can be generalized, at the expense 
of increasing the length and the burden in the technical development, to the case 
in which T is a d- dimensional manifold satisfying the so-called Whitney conditions 

(UD- 

The remainder of this section is devoted to the proof of Proposition 7.2, Theorem 
7.4 and Theorem 7.5. 
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7.1. Bias Control: Proofs of Proposition 7.2 and Theorem 7.4- We start with 
some useful lemmas, for all of which we assume that Conditions Bl and B2 are sat- 
isfied. We shall also assume that the global maximum of / over T is achieved, with 
probability one, at a single point in T. Additional conditions under which this will 
happen can be found in [4] and require little more than the non-degeneracy of the 
joint distribution of / and its first and second order derivatives. Of these lemmas. 
Lemma 7.7, the proof of which we defer to Section 7.3, is of central importance. 

Lemma 7.7. For any oq > 0, there exists c* , S* , bo, and 6q (which depend on 
the choice of ao), such that for any s e T, a £ (0, oq), 6 g (0, So), and b > bo 



5 



P{_ min ^ fit) < b\f{s) = b + a/b, s e C) < c* cxp ^ , , 



where C is the set of local maxima of f. 

The next lemma is a direct corollary of Lemma 7.7. 

Lemma 7.8. Let t* be the point in T at which the global maximum of f is 
attained. Then, with the same choice of constants as in Lemma 7.7, for any a e 
(0, ao), S € (0, 6o), and b > bo, we have 

P{ min /(t)<6|/(r) = 6 + a/6)<2c*cxpf-^ 
Proof. Note that, for any s £ T, 



P[ min f{t)<b\f{s)^b + a/b,s = r 

^\t\<Sa/b 

_ P (min|t|<5a/b f{t) < b, f{s) = b + a/b, s = t*) 



< 



P{f{s) = b + a/b,s = t*) 

P (min|f|<aa/b f{t) < b, f{s) = b + a/b, s £ £) 

P{fis) = b + a/b,s = t*) 

= P( min^ ^ fit) < 6|/(.) ^ b + a/b, s e C) ^^^Ij'] = ^ ^''^ ^ ^ f. . 
t-s|<dafc-i P (j[s) ^ b + a/b,s ^ t*) 

<c*cxp('-^^ Pim^b + a/b,seC) 



S^J P{f{s)^b + a/b,s = t*)' 

Appling this bound and the fact that 

Pifis)^b + a/b,seC) 
P{f{s) = b + a/b,s = t*) ^ 

as & — > cx), the conclusion of the Lemma holds. □ 
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The following Icmnia gives a bound on the density of sup^gy /(Oj which will be 
used to control the size of overshoot beyond level b. 

Lemma 7.9. Letpf-{x) be the density function of sup^^j^ f (t) . Then there exists 
a constant Cf and bo such that 

Pf*{x)<Cf,x''+'P{f{0)>x), 

for all X > bo- 

Proof. Recalling (1.3), let the continuous function p^{x), a; € M, be defined by 
the relationship 

/■OO 

E{x{{teT: f{t)>b}))= / p^{x)dx, 

Jb 

where the left hand side is the expected value of the Euler-Poincare characteristic 
of A),. Then, according to Theorem 8.10 in [7], there exists c and S such that 

Ip^'ix) - pf,ix)\ < cP{f{0) > {1 + 5)x), 

for all a; > 0. In addition, thanks to the result of [4] which provides J^ p^{x)dx in 
closed form, there exists cq such that, for all x > 1, 

p^'ix) < cox^+'PifiO) > x). 

Hence, there exists c/. such that 

Pf,{x) < cox''+^P{f{0) > x) + cP{f{0) > (1 + S)x) < c/.a;'^+ip(/(0) > x), 

for all a; > 1. □ 

The last ingredients required to provide the proof of Proposition 7.2 and Theorem 
7.4 are stated in the following well known Theorem, adapted from Lemma 6.1 and 
Theorem 7.2 in [19] to the twice differentiable case. 

Theorem 7.10. There exists a constant H (depending on the covariance func- 
tion C , such that 

(7.4) P(sup/(t) >b) = (l + o{l))Hm{T)b''P{f{0) > b), 

teT 

as 6 — > cxj . 

Similarly, choose 6 small enough so that [0, S]"^ C T, and let Aq = [0, b~^]'^. Then 
there exists a constant Hi such that 

(7.5) P{ sup fit) > 6) = (1 + o(l))i/iP(/(0) > 6), 

tGAo 

as 6 — > cx) . 
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Wc now are ready to provide the proof of Proposition 7.2 and Theorem 7.4. 
Proof of Proposition 7.2. The fact that there exists ci such that 

#(r) < cim{T)e-~%'^ 

is immediate from assumption B2. Therefore, we proceed to provide a bound for 
the relative bias. We choose eq < Sq {So is given in Lemmas 7.7 and 7.8), and note 
that, for any e < Eq, 

P(sup/(t)<6|sup/(i)>&) 

<F(sup/(i) < b + 2y/I/b\sup fit) > b) + P{sup fit) <&|sup/(i) >b + 2y^/b). 
teT teT t^f teT 

By (7.4) and Lemma 7.9, there exists C2 such that the first term above can bounded 

by 

P(sup fit) <b + 2y/e/b\ sup fit) >b)< c^^fe. 
teT teT 

The second term can be bounded by 

P(sup fit) < b\ sup fit) > b + ^sTejb) 
tef t^T 

<P( sup fit) <b\snp fit) >b + 2^/1) 

t-t*|<2e6-i teT 

<2c*exp(-re"i). 

Hence, there exists cq such that 

P(sup/(t) < &|sup/(t) >b)< C2^/e + 2ce~^'l' < coVe, 
tef *eT 

for all e e (0,eo) ^ 



Proof of Theorem 7.4. We write 6* = 1 - 3(5 G (0, 1). First note that, by (7.4), 
P(sup fit) > b + fo^^'-i I sup fit) > fe) ^ 0, 

T T 

as 6 — ^ oo. Let t* be the position of the global maximum of / in T. According to the 
exact Slepian model in Section 7.3 and an argument similar to the proof of Lemma 

7.8. 

(7.6) P( sup fit)>b\b< fit*)<b + b^^-^)^0, 

|t-f|>b2«-l 

as & — >■ oo. Consequently, 

P( sup fit) > b\ sup fit) >b)^0. 

|t-i*l>b2-5-i T 
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Let 



B(r,&2'5-i) = u,gj.S(i,&2'5-i) 



We have, 



P(sup/(i)>fe|sup/(i)>6) 

f T 

<P{ sup f{t)>b\suvf{t)>b) 

|i-f|>62«-i T 

+ P(sup fit) > b, sup fit) < b\ sup fit) > b) 

f |t-f|>&2<s-i T 

< o(l) + Pit* e B(f ,62*-i)| sup/(i) > b) 

T 

<o(l)+P( sup /(i)>fe|sup/(t)>6). 

Since #(T) < 6(1-3*')'', we can find a finite set T' = {t[, ...,t\} C T and let Afe = 
t'k + [0,b-^] sucli tliat^/ = 0(6(1-'')'') ^^^ Bif,b^^-^) c ui^iAfc. Tlie clioice of 
I only depends on #(r), not the particular distribution of T. Therefore, applying 
(7.5), 

sup Pi sup fit) > 6) < Oib^^-^'>'^)PifiO) > 6). 

This, together with (7.4), yields 

sup P( sup /(i) > 6| sup/(i) > 6) < 0(6""''') = o(l), 

for 6 > 6o, which clearly implies the statement of the result. D 



7.2. Variance Control: Proof of Theorem 7.5 . We proceed directly to the proof 
of Theorem 7.5. 
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Proof of Theorem 7.5. Note that 



P2 (sup.gj. / (t) > b) 



EiLb) 



P^sup^^r f (t) > b) 



E 



MxP{Xi>b~l/b) 



E"=il(^.>fc- 



jl ; maxj Xj > b 



P^supt^r f (t) > b) 



P2(sup,gr/(i)>6) 



£; 



A/P(Xi>fc-l/b)l(i] 



E;^ii(-y,>fe'^^T7fer 



■'X,>b) 



SUPtgT f {t)>b 



E 



P (supt^r f (t) > b) 
Ml (maxj Xj > 6) 



EUi(^j>^-iA) 



sup/ (t) > 6 — -7-7 T-. 



The remainder of the proof involves showing that the last conditional expectation 
here is of order 0{b'^). This, together with (7.4), will yield the result. Note that for 
any A (6, e) such that A {b, e) = 0(A/) uniformly over b and e we can write 



(7.7) 



E 



M X t (maxj Xj > b) 



sup/(i) > b 
teT 1 



Eti^{x,>b-r 

M X 1 (j2fU 1 (^J > b - 1/6) > 



< E 



M 
A(b,e 



Et^H-.>b-i: 



sup / (t) > b 



E 



M X 1 (1 < E-UtjX, >b-l/b) < ^) 



E^Ut{x,>b-l) 



sup f{t)>b 



We shall select A{b,e) appropriately in order to bound the expectations above. By 
Lemma 7.8, for any 4e < 5 < Sq, there exist constants c' and c" € (0, 00), such that 



sup / (i) > b 

teT 



c* exp --^ > P[ min f{t)<b- 1/5 
5^ J ~ \\t-t'\<s/b 



M 



> P I ^ 1 (Xj > 6 - 1/b) < di'^je'^ 



(7.8) 



> P 



M 



^M 



> 



b'^c" 



Y.Unx,>b-i/b) 



sup fit) >b 
teT 



sup f{t)>b 

teT I 
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The first inequality is an application of Lemma 7.8. The second inequality is due 
to the fact that for any ball B of radius 4e or larger, #(T D B) > c"^e~'^ for some 
c' > 0. Inequality (7.8) implies that for all x such that b'^c" /5f^ <x < ¥c" /[A'^e'^], 
there exists 5** > such that 



M 



T-Ut{X,>b-\)¥ 



> X 



sup / (t) > 6 j < c* exp {-5**x'^''^'\ 



Now let A{b^e) = b'^c" /{4'^e'^) and observe that by the second result in (7.1) we 
have A {b, e) = (M) and, moreover, that there exists C3 such that the first term 
on the right hand side of (7.7) is bounded by 



(7.9) 



E 



^M 



M X 1 J2i=i 1 (^j > ^ - 1/&) > 



M 



^M 



E;Utix,>b-i/b) 



SUpf{t)>b\ <C3&'' 
T 



Now we turn to the second term on the right hand side of (7.7). We use the fact 
that M/A{b, e) < c'" £ (0, 00) (uniformly as 6 — > 00 and e — > 0). There exist C4 and 
C5 such that, for e < S0/C4, 



(7.10) 



E 



Ml i<E-iii(^.>^^-i)< 



M 
A{b,s) 



E,Li^{x,>b 



< MP 



< MP 



Y,t[X,>b 



min f it) < b 

\t~t*\<Cie/b 



<C" 



1/6 



sup/(i) > b 

T 



sup f{t)>b 
T 



sup f{t)>b 



<cim(r)6'*£"''cxp 



,2^2 



rt£ 



< c^h"-- 



The second inequality holds from the fact that if X) i=i 1 {Xj > b — ^) is less than 
c'", then the minimum of / in a ball around the local maximum and of radius 046/6 
must be less than 6 — 1/6. Otherwise, there are more than c'" elements of T inside 
such a ball. The last inequality is due to Lemma 7.8 and Theorem 7.2. 

Putting (7.9) and (7.10) together we obtain, for all e/b -regular discretizations 
with e < £0 = min(l/4, 1/04)60, 



eQlI 



P {supt^T f it) > b) 



< E 



Ml (maxj Xj > b) 



^M 



j:ut{x,>b 



sup/(t) >6 



P{Xi > 6 - 1/6) 
P(sup,gj,/(t)>6) 



< (C3+C5) 



¥P{X^ >6-1/6) 



P(sup,ej,/(i)>6)- 
Applying now (7.4) and Proposition 7.2, we have that 

P(sup,g^/(t)>6) 



P(sup/(i)>&)< 



1 - CQy/e 
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wc have _ 

sup -. y- r- < OO 

h>6o,ee[0,eo] P (suPtgf / (0 > b) 

as required. □ 



7.3. A final proof: Lemma 7.7. Without loss of gcnerahty, we assume that the 
field has mean zero and unit variance. Before getting into the details of the proof 
of Lemma 7.7, we need a few additional lemmas, for which we adopt the following 
notation: Let Ci and Cij be the first and second order derivatives of C, and define 
the vectors 

Mt)^{-Ci{t),...,-Cd{t)), 

fi2{t) = vech {{Cij{t),i = l,...,d,j = i,...,d)). 

Let /'(O) and /"(O) be the gradient and vector of second order derivatives of / at 0, 
where /"(O) is arranged in the same order as /i2(0). Furthermore, let /xo2 = /^Jo be 
a vector of second order spectral moments and fj,22 a matrix of fourth order spectral 
moments. The vectors //02 and ^22 are arranged so that 

1 /X02 
A 

M20 /i22 

is the covariance matrix of (/(O), /'(O), /"(O)), where A = (— Cij(O)). It then follows 
that 

/^20 = M22 — M20M02 

be the conditional variance of /"(O) given /(O). The following lemma, given in [5], 
provides a stochastic representation of the / given that it has a local maxima at 
level u at the origin. 

Lemma 7.11. Given that f has a local maximum with height u at zero (an 
interior point of T), the conditional field is equal in distribution to 

(7.11) /„(t) ^ uC (t) ~ WuP'^ (t) + g (t) . 

g{t) is a centered Gaussian random field with covariance function 



1 M02 A V ^W \ ..f.^^~l..-T. 



and Wu is a -^ — - random vector independent of git) with density function 
(7.12) ^u{w) oc |det(r*(w) - uA) | exp ( -;rW^/i2".oW | t{r*{w) - uA e Af) , 
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where r*{w) is a dx d symmetric matrix whose upper triangular elements consist of 
the components ofw. The set of negative definite matrices is denoted by M . Finally, 
j3{t) is defined by 

{a{t),p{t))^{C{t).m{t)){ ^ 7 ' 

The following two technical lemmas, which we shall prove after completing the 
proof of Lemma 7.7, provide bounds for the last two terms of (7.11). 

Lemma 7.12. Using the notation in (7.11), there exist 5q, ei, ci, and bo such 
that, for any u > b > bo and S G (0, So), 



P sup \Wul3 ' {t)\ > — ) < ciexp 

\\t\<Sa/b 



^ <crexp -^ 



Lemma 7.13. There exist c, S, and Sq such that, for any S G (0,(5o), 
P ( max \g{t)\ > -- ) < ccxp -— . 



Proof of Lemma 7.7 Using the notation of Lemma 7.11, given any s e £ for 
which f{s) = b, we have that the corresponding conditional distribution of s is that 
of fb{- — s). Consequently, it suffices to show that 

P ( min fb{t) <b-a/b] <c*exp(~-r 

\\t\<Sa/b ) \ S^ 

We consider firstly the case for which the local maximum is in the interior of T . 
Then, by the Slepian model (7.11), 

fb(t)=bC(t)-Wb^'^ {t)+g(t). 

We study the three terms of the Slepian model individually. Since, 

C(t) = l-<^A< + o(|t|^) , 

there exists a Eq such that 

^cit)>b-l^, 
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for all \t\ < eoy/a/b. According to Lemmas 7.12 and 7.13, for 6 < lnm{eo/^/ao,6o), 



P min fu(t) <b - a/b 

\\t\<Sa/b 



-^i.V''^\^9it)\>^A+p( sup \Wtl3'^{t)\>^ 

\t\<Sa/b AbJ \\t\<Sa/b 46^ 

£ib^\ 




ci cxp 



(54 



for some c* and S*. 

Now consider the case for which the local maximmn is in the (d— l)-dimcnsional 
boundary of T. Due to convexity of T and without loss of generality we assume 
that the tangent space of dT is generated by d/dt2, ...yd/dtd, the local maximum 
is located at the origin, and T is a subset of the positive half-plane ti > 0. That 
these arguments to not involve a loss of generality follows from the arguments on 
pages 291-192 of [4], which rely on the assumed stationarity of / (for translations) 
and the fact that rotations, while changing the distributions, will not affect the 
probabilities that we are currently computing.) 

For the origin, positioned as just described, to be a local maximum it is necessary 
and sufficient that the gradient of / restricted to dT is the zero vector, the Hessian 
matrix restricted to dT is negative definite, and 9i/(0) < 0. Applying a version of 
Lemma 7.11 for this case, conditional on /(O) = u and being a local maximum, 
the field is equal in distribution to 

(7.13) uC{t) - WuP'^it) + Mi(t)A-i ^^^ Q^ _^^Q-^T ^ ^^^^^ 

where Z < corresponds to dif (0) and it follows a truncated (conditional on the 
negative axis) Gaussian random variable with mean zero and a variance parameter 
which is computed as the conditional variance of dif (0) given {82/ (0) , ..., ddf (0)). 
The vector W^ is a {d{d + l)/2)-dimcnsional random vector with density function 

'ipuiw) ex |det(r*(w) — uA)| exp(— -ti;^^2^j!)w)l(w* — uA e A/"), 

where A is the second spectral moment of / restricted to dT and r*(w) is the 
(rf — 1) X (d — 1) symmetric matrix whose upper triangular elements consist of the 
components of w. In the representation (7.13) the vector Wu and Z are independent. 
As in the proof of Lemma 7.12, one can show that a similar bound holds, albeit 
with with different constants. Thus, since /xi (t) = O (t), there exist c" and S" such 
that the third term in (7.13) can be bounded by 



P\ max 

\t\<Sa/b 



Ml (t) A-i (Z, 0, ..., 0)^1 > a/ {4b)] < c" cxp (-^-^\ 
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Consequently, we can also find c* and 6* such that the conclusion holds, and we 
are done. □ 

We complete the paper with the proofs of Lemmas 7.12 and 7.13. 

Proof of Lemma 7.12. It suffices to prove the Lemma for the case a = 1. Since 

fu{0)=u = u- Wul3'^ (0) + g (0) , 

and Wu and g are independent, f3 (0) = 0. Furthermore, since C" (t) ~ O (t) and 
^2 (^) = O {t), there exists a cq such that \f3{t)\ < co|tp. In addition, Wu has density 
function proportional to 

■ipuiw) oc |det(r*(w) - uA)\ exp(--u;^^^ou;)l {w* - uA e J\f) . 

Note that det(r*(w) — uA) is expressible as a polynomial in w and u, and there 
exists some Sq and c such that 



det (r*(u') - uA) 



< c 



det (-uA) 
if \w\ < Equ. Hence, there exist £2, C2 > 0, such that 

■0« (w) < -0 (w) : = C2 exp f - -eaw^^j^-o^ ) ' 

for all M > 1. The right hand side here is proportional to a multivariate Gaussian 
density. Thus, 



Pi\Wu\>x)= Mw)dw< ^Pu{w)dw = C3P{\W\>x), 

J \w\>x J\w\>x 

where M^ is a multivariate Gaussian random variable with density function propor- 
tional to ■0- Therefore, by choosing ei and ci appropriately, we have 

for aU u > 5. □ 



Proof of Lemma 7.13. Once again, it suffices to prove the Lemma for the case 
a = 1. Since 

M0)=b = b-Wbl3^i0)+gi0), 

the covariance function (7 (s, t) : s,t E T) of the centered field g satisfies 7(0, 0) = 0. 
It is also easy to check that 

dsj{s,t) = o{\s\ + |i|), da{s,t) = o{\s\ + |t|). 
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Consequently, there exists a constant c^ G (0,oo) for which 

7(s,t)<c^(|s|2 + |t|2), ^{s,s)<c^\s\\ 

We need to control the tail probability of supu|<^/f, |,9(0I- For this it is useful to 
introduce the following scaling. Define 

b fSt 

Then sup^^^^g^^g (t) > ^ if and only if sup|t|<i gs (t) > ^. Let 

(Js{s,t) :^ E {gs (s) , gs (t)) . 

Then, 

Supcr5(s, s) < C^. 

Because 7(5, t) is at least twice differentiable, applying a Taylor expansion we easily 
see that the canonical metric dg corresponding to gs (s) (cf. Theorem 6.7) can be 
bounde as follows: 



dl{s,t)^E{gs{s)-gs{t)f^^' 



6s Ss\ /6t St\ fSs 6t 



<c\s-tf , 

for some constant c G (0, 00). Therefore, the entropy oi gs, evaluated at d, is bounded 
by KS~'^ for any S > and with an appropriate choice of K > 0. Therefore, for all 
S <So, 

P{ sup \g{t)\ > 1) = P(sup ,9, (i) > 1) < Crf<S-'^-''exp(-— i— ), 
\t\<s/b 46 |4|<i 4d 16c^d^ 

for some constant c^ and 77 > 0. The last inequality is a direct application of The- 
orem 4.1.1 of [4]. The conclusion of the lemma follows immediately by choosing c 
and 6 appropriately. □ 
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